Linear Complexity Context-Free Parsing Pipelines via Chart Constraints
Abstract
In this paper, we extend methods from Roark and Hollingshead (2008) for reducing the worst-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state tagging pre-processing. Methods from our previous paper achieved quadratic worst-case complexity. We prove here that alternate methods for choosing constraints can achieve either linear or O(N log N) complexity. These worst-case bounds are shown to be achieved without reducing parsing accuracy and, in some cases, while improving it. The new methods achieve observed performance comparable to the previously published quadratic-complexity method. Finally, we demonstrate improved performance by combining complexity-bounding methods with additional high-precision constraints.
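As a rough illustration of what a hard chart constraint looks like, the sketch below assumes the tagging scheme described in the related work listed on this page: each word is tagged for whether it may begin and whether it may end a multi-word constituent, and a chart cell spanning two or more words stays open only if both its first and last word pass. The function name `open_cells` and the example tags are hypothetical. The sketch does not reproduce the paper's methods for choosing constraints so that the linear or O(N log N) bounds hold.

```python
from typing import List, Tuple


def open_cells(can_begin: List[bool], can_end: List[bool]) -> List[Tuple[int, int]]:
    """Return the multi-word chart cells (i, j) that remain open.

    i and j are inclusive word indices with i < j. A cell stays open only
    if word i may begin and word j may end a multi-word constituent.
    Illustrative filter only, not the paper's constraint-selection method.
    """
    n = len(can_begin)
    if len(can_end) != n:
        raise ValueError("can_begin and can_end must have the same length")
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if can_begin[i] and can_end[j]
    ]


# Hypothetical tagger output for a five-word sentence.
can_begin = [True, False, True, False, False]
can_end = [False, True, False, False, True]

cells = open_cells(can_begin, can_end)
total = len(can_begin) * (len(can_begin) - 1) // 2  # all multi-word cells
print(f"{len(cells)} of {total} multi-word cells remain open: {cells}")
```

With these example tags, only cells whose first word is tagged as a possible start and whose last word is tagged as a possible end survive; every other multi-word cell is closed before context-free inference begins.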
Similar resources
Finite-State Chart Constraints for Reduced Complexity Context-Free Parsing Pipelines
We present methods for reducing the worst-case and typical-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state pre-processing. We perform O(n) predictions to determine if each word in the input sentence may begin or end a multi-word constituent in chart cells spanning two or more words, or allow single-word constituents in chart cells spanning the w...
Parsing with Soft and Hard Constraints on Dependency Length
In lexicalized phrase-structure or dependency parses, a word’s modifiers tend to fall near it in the string. We show that a crude way to use dependency length as a parsing feature can substantially improve parsing speed and accuracy in English and Chinese, with more mixed results on German. We then show similar improvements by imposing hard bounds on dependency length and (additionally) modelin...
Efficient Parsing of Well-Nested Linear Context-Free Rewriting Systems
The use of well-nested linear context-free rewriting systems has been empirically motivated for modeling of the syntax of languages with discontinuous constituents or relatively free word order. We present a chart-based parsing algorithm that asymptotically improves the known running time upper bound for this class of rewriting systems. Our result is obtained through a linear space construction...
Classifying Chart Cells for Quadratic Complexity Context-Free Inference
In this paper, we consider classifying word positions by whether or not they can either start or end multi-word constituents. This provides a mechanism for “closing” chart cells during context-free inference, which is demonstrated to improve efficiency and accuracy when used to constrain the well-known Charniak parser. Additionally, we present a method for “closing” a sufficient number of chart ...
Scanning and Parsing Languages with Ambiguities and Constraints: The Lamb and Fence Algorithms
Traditional language processing tools constrain language designers to specific kinds of grammars. In contrast, model-based language processing tools decouple language design from language processing. These tools allow the occurrence of lexical and syntactic ambiguities in language specifications and the declarative specification of constraints for resolving them. As a result, these techniques r...
Publication date: 2009